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ABSTRACT 


The  application  of  multiscale  and  stochastic  techniques  to  the  solution  of  linear  inverse  problems  is  presented. 
This  approach  allows  for  the  explicit  and  easy  handling  of  a  variety  of  difficulties  commonly  associated  with 
problems  of  this  type.  Regularization  is  accomplished  via  the  incorporation  of  prior  information  in  the  form  of  a 
multiscale  stochastic  model.  We  introduce  the  relative  error  covariance  matrix  (RECM)  as  a  tool  for  quantitatively 
evaluating  the  manner  in  which  data  contributes  to  the  structure  of  a  reconstruction.  In  particular,  the  use  of 
a  scale  space  formulation  is  ideally  suited  to  the  fusion  of  data  from  several  sensors  with  differing  resolutions 
and  spatial  coverage  (eg.  sparse  or  limited  availability).  Moreover,  the  RECM  both  provides  us  with  an  ideal 
tool  for  understanding  and  analyzing  the  process  of  multisensor  fusion  and  allows  us  to  define  the  space-varying 
optimal  scale  for  reconstruction  as  a  function  of  the  nature  (resolution,  quality,  and  coverage)  of  the  available 
data.  Examples  of  our  multiscale  m2udmum  a  posteriori  inversion  algorithm  are  demonstrated  using  a  two  channel 
deconvolution  problem. 


1  INTRODUCTION 


The  objective  of  a  linear  inverse  problem  is  the  recovery  of  an  underlying  quantity  given  a  collection  of  noisy, 
linear  function2Js  of  this  unknown.  This  type  of  problem  arises  in  fields  as  diverse  as  geophysical  prospecting, 
mediccil  imaging,  image  processing,  groundwater  hydrology,  and  global  ocean  modeling.  While  it  is  not  difficult 
to  find  practical  instances  of  linear  inverse  problems,  it  is  often  quite  challenging  to  generate  their  solutions.  In 
many  instances,  regularization  is  required  to  overcome  problems  associated  with  the  poor  conditioning  of  the 
linear  system  relating  the  observations  to  the  underlying  function.  Even  if  the  problem  is  not  ill-conditioned,  a 
regularizer  may  be  incorporated  as  a  means  of  constraining  the  reconstruction  to  reflect  prior  knowledge  concerning 
the  behavior  of  this  function.  For  example,  it  is  common  practice  to  regularize  a  problem  so  as  to  enforce  a  degree 
of  smoothness  in  the  reconstruction.  Also,  in  disciplines  such  as  geology,  the  phenomena  under  investigation  are 
fractal  in  nature  in  which  case  a  prior  model  with  a  l//-type  power  spectrum  is  used  as  a  regularizer. 

In  addition  to  the  regularization  issue,  characteristics  of  the  data  set  available  to  the  inversion  algorithm 
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can  create  difficulties.  In  many  inverse  problems,  a  large  quantity  of  data  from  a  suite  of  sensors  is  available 
for  the  inversion;  however,  the  information  conveyed  by  each  measurement  process  may  be  far  from  complete 
so  that  one  is  confronted  with  the  problem  of  fusing  data  from  several  sources  to  achieve  the  desired  level  of 
performance  in  the  inversion.  Hence,  there  is  a  need  for  understanding  precisely  how  data  contributes  information 
to  a  reconstruction  and  the  manner  in  which  measurements  from  different  sources  are  merged  by  the  inversion 
routine.  Alternatively,  the  availability  of  the  data  often  is  limited.  For  example,  one  may  be  constrained  to 
collecting  measurements  on  the  boundary  of  a  region  while  the  quantity  of  interest  is  to  be  estimated  over  the 
interior.  Here,  one  requires  flexible  inversion  algorithms  capable  of  processing  data  possessing  sparse  or  limited 
spatial  distributions.  Additionally,  one  must  compensate  for  errors  present  in  the  data  which  may  arise  from 
noise  in  the  measurement  apparatus,  unknown  quantities  associated  with  the  experimental  conditions,  modeling 
errors  induced  by  the  simplification  of  physics  and  the  presence  of  nuisance  parameters  in  the  model.  Finally, 
one  must  be  concerned  with  the  computational  complexity  of  the  inversion  algorithm.  Typically,  the  inversion 
requires  the  solution  of  a  large  system  of  linear  equations  so  that  advantage  must  be  taken  of  any  structure  or 
sparseness  present  in  the  matrices  associated  with  the  problem. 

In  this  paper  we  develop  a  frariwwork  for  inversion  based  upon  a  multiscale  description  of  the  data,  the 
operators,  and  the  function  to  be  reconstructed.  The  inversion  algorithm  used  here  is  drawn  from  the  theory  of 
statistical  estimation.  Such  an  approach  allows  for  the  explicit  modeling  of  the  errors  in  the  data  as  sample  paths 
from  random  processes.  All  prior  information  regarding  the  structure  of  the  underlying  function  is  summarized 
in  the  form  of  a  statistical  model  which  also  acts  as  a  regularizer.  Moreover,  these  techniques  compute  not  only 
the  estimate  of  the  function  of  interest,  but  also  provide  a  built-in  performance  indicator  in  the  form  of  an  error 
covariance  matrix.  This  matrix  is  central  to  an  understanding  of  the  manner  in  which  information  from  a  set  of 
observations  is  propagated  into  a  reconstruction. 

We  utilize  a  1//  fractal  prior  model  specified  in  the  wavelet  transform  domain  for  the  purposes  of  regularization. 
While  clearly  not  the  only  multiscale  model  available  for  this  purpose,  the  1//  model  is  useful  for  a  number  of 
reasons.  First,  as  noted  in  [3],  this  model  produces  the  same  effects  as  the  more  traditional  smoothness  regularizers. 
Hence,  its  behavior  and  utility  are  well  understood.  Second,  as  noted  previously,  it  is  appropriate  to  use  a  model 
of  this  type  in  many  applications  where  the  underlying  process  possesses  a  fractal  or  self-similar  structure.  Finally, 
l//-fype  processes  assume  a  particularly  simple  form,  easily  implemented  in  the  wavelet  transform  domain. 

The  inversion  algorithms  developed  in  this  paper  are  unique  in  their  ability  to  overcome  many  of  the  data- 
oriented  difficulties  associated  with  spatial  inverse  problems.  Specifically,  our  techniques  are  designed  for  the 
processing  of  information  from  a  suite  of  sensors  where  the  sampling  structure  of  each  observation  process  may  be 
sparse  or  incomplete.  For  problems  with  a  shift  invariant  structure,  processing  such  data  via  traditional  Fourier 
techniques  typically  requires  the  use  of  some  type  of  space-domain  windowing  or  interpolation  method  which  tend 
to  cause  distortion  in  the  frequency  domain.  By  using  the  multiscale  approach  developed  here,  such  preprocessing 
is  unnecessary  thereby  avoiding  both  the  cost  of  the  operation  and  the  distortion  in  the  transform  domain. 

Given  this  ability  to  merge  data  from  a  variety  of  sources,  we  develop  a  quantitative  theory  of  sensor  fusion  by 
which  we  are  able  to  understand  how  information  from  a  suite  of  observations  is  merged  to  form  the  reconstruction. 
The  insight  provided  by  our  analysis  can  be  used  to  control  signal  processing  “greed”  by  defining  the  space-varying, 
optimal  scale  of  reconstruction  as  a  function  of  (1)  the  physics  relating  the  unknown  quantity  to  the  measurements 
and  (2)  the  spatial  coverage  and  measurement  quality  of  the  data  each  observation  source  provides.  In  general, 
this  approach  allows  for  the  recovery  of  fine  scale  detail  only  where  the  data  supports  it  while  at  other  spatial 
locations,  a  coarser  approximation  to  the  function  is  generated. 
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Figure  1;  Convolutional  Kernel  Functions 


*  ♦ 

2  PROBLEM  FORMULATION 


2.1  The  observations  processes 

In  this  paper,  the  the  data  upon  which  the  inversion  is  to  be  based,  yj,  are  related  to  the  function  to  be 
reconstructed,  g,  via  a  system  of  linear  equations  embedded  in  additive  noise.  Hence  the  observation  model  to 
be  considered  is 

yi  =  Tig  +  ni  i=l,2,...K  (1) 

with  yi,ni  £  and  g  £  Each  vector  yi  represents  the  measurements  associated  with  the  sensor  whose 
transfer  function  is  defined  by  the  matrix  T*.  The  components  of  yt  are  eissumed  to  be  samples  of  an  underlying 
continuous  observations  process,  yi{x),  where  x  represents  one  space  dimension. 

A  key  feature  of  the  modeling  structure  of  (1)  is  its  flexibility.  By  specifying  the  structure  of  the  kernels, 
multisensor  fusion  problems  can  be  described  wherein  the  data  from  individual  sources  conveys  information 
about  g  at  a  variety  of  spatied  scales.  In  Section  4,  a  two  channel  deconvolution  problem  is  considered.  The  kernel 
functions  in  this  case  are  denoted  Tf  and  and  are  plotted  in  Figure  1.  The  kernel  labeled  Tf  gives  essentially 
pointwise  observations  thereby  supplying  fine  scale  data  for  the  inversion.  Alternatively,  Tc  performs  a  local 
averaging  of  the  function  g  so  that  y^  provides  coarse  scale  information  regarding  the  structure  of  g.  Throughout 
this  paper,  the  subscript  /  and  c  are  used  to  denote  quantities  associated  with  the  fine  and  coarse  scale  processes 
respectively. 


2.2  A  wavelet  representation  of  g{x) 

A  multiscale  representation  of  g  is  obtained  via  the  use  a  wavelet  expansion.  The  elements  of  the  vector  g 
are  taken  to  be  a  collection  of  scaling  coefficients  at  shifts  n  =  1, 2, . .  at  some  finest  scale  of  representation. 
Mg.  The  wavelet  transform  of  g,  denoted  7,  is  composed  of  a  coarsest  set  of  scaling  coefficients,  g{Lg),  at  scale 
Lg  <  Mg  and  vectors  of  wavelet  coefficients  7(m)  for  scales  m  =  Lg,  Lg  +  1, . . Mg  —  1  [1,2].  Each  7(m)  then  is 
comprised  of  the  wavelet  coefficients  at  all  shifts  of  interest  at  scale  m  with  an  analogous  interpretation  holding 
for  the  elements  of  g{Lg).  Moreover,  7  is  obtained  from  g  via  the  orthonormal  transformation 

7  =  Wjg  (2) 

where  VVg  is  the  wavelet  transform  matrix  associated  with  a  particular  compactly  supported  wavelet  [5].  We 
choose  to  subscript  the  wavelet  transform  operator  here  as  Wj  to  make  explicit  that  this  is  the  transform  for  g. 
In  genercd,  we  may  use  different  wavelet  transforms  for  the  various  data  sets,  yi. 
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- -  Shift  - - 

Figure  2:  A  sample  lattice  structure  corresponding  to  a  D4  wavelet  transform.  The  finest  scale  is  taken  as  Mg 
whUe  the  coarsest  is  Lg. 


The  relationships  among  the  scale  space  component  in  the  decomposition  of  g  are  graphically  represented  in 
the  form  of  a  lattice  as  shown  in  Figure  2  for  the  case  of  a  wavelet  decomposition  based  upon  the  Daubechies 
4-tap  wavelet  [2].  At  the  finest  scale,  the  nodes  represent  the  finest  set  of  scaling  coefficients.  Each  node  at  all 
other  scales  contains  one  wavelet  and  one  scaling  coefficient.  A  coarse  scale  node  is  said  to  impact  a  finer  scale 
if  there  exists  a  strictly  downward  path  on  the  lattice  from  the  former  to  the  latter.  For  m  ^  Mg ,  we  define  the 
downward  impact  set,  V{m,  i),  associated  with  the  node  (m,  i)  (i.e.  the  node  at  scale  m  and  shift  i)  as  the  set  of 
finest  scale  nodes  which  this  node  ultimately  impacts.  Thus  in  Figure  2,  2?(n)  is  comprised  of  all  nodes  marked 
with  the  symbol 


2.3  Transformation  of  the  observation  equations  to  wavelet  space 

Equation  (1)  relates  the  finest  sccde  scaling  coefficients,  g,  and  the  samples  of  the  noise  processes  to  the  samples 
of  the  observation  process  j/,-.  For  the  purposed  of  the  inversion,  we  desire  a  relationship  between  the  wavelet 
transform,  j,  of  g  and  a  multiscale  representation  of  Ui  to  a  multiscale  representation  of  the  data.  Toward  this 
end,  we  define  a  discrete  wavelet  transform  operator  that  takes  the  vector  of  sampled  measurements,  ,  into  its 
wavelet  decomposition 

r,i  =  WiVi  =  WiTiWj  7  + 

=  ©<7  -f  Wi  (3) 

where,  r],  consists  of  a  coarsest  scale  set  of  scaling  coefficients,  j/i(2/,),  at  scale  Li  and  a  complete  set  of  finer  scale 
wavelet  coefficients  ‘t]i{m),  Li  <  m  <  Mi  —  where  is  the  finest  scale  of  representation. 

In  Table  1,  we  summarize  the  notation  that  we  will  use.  For  example,  for  the  data  yi,  the  corresponding 
wavelet  transform  r}i  =  WiPi  consists  of  wavelet  coefficients  r]i{m),  Li  <  m  <  Mi  —  1,  and  coarsest  scale  scaling 
coefficients  yi{Li).  Also,  if  we  form  only  partial  wavelet  approximations  from  scale  Li  through  scale  m,  the 
corresponding  scaling  coefficients  (which  are  obtained  from  yi{Li)  and  r]i{k),  Li  <  k  <m  —  1  [5])  are  denoted  by 
j/i(m).  We  adopt  the  analogous  notation  for  the  function  g  and  the  noise  rij. 

Finally,  it  is  often  useful  to  work  with  the  “stacked”  system  of  data  y  =  Tg+n  where  y  contains  the  information 
from  all  sensors  and  is  given  by 

y  =  [y^y^  ..-ylf 

T  =  [Tf  Tj 

n=  [nj  nf  ...n^-]’’. 

In  the  transform  domain,  the  corresponding  equation  is 

T)  =  &J+  u  (4) 

with  rj,  0,  and  u  are  defined  in  the  obvious  manner. 
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Quantity 

Wavelet  Transform 

Wavelet  Coefficients 

Scaling  Coefficients 

Data  Pi 

m  =  Wij/i 

77i(T77) 

yi(m} 

Function  9(35) 

7  =  Wgg 

7(777) 

9(m) 

Noise  77i 

l/i  =  W<77j 

l/<(777) 

ni(m) 

Table  1:  Notation  for  wavelet  and  scaling  coefficient  vectors 


3  Multiscale,  Statistical  Inversion  Algorithms 


3.1  A  maximum  a  posteriori  approach  to  inversion 

In  this  paper,  we  consider  the  maximum  a  posteriori  (MAP)  estimate  of  7  under  the  conditions  that  n  ~ 
R)  and  the  prior  model  for  7  is-  of  the  form  7  ~  ^(0,  Po).  For  Po  positive  definite,  the  MAP  estimate  is  [9] 

JMAP  =  +  Po^)-^Q'^R-^v-  (5) 

We  utilize  a  fractal- type  of  prior  model  recently  developed  by  Wornell  and  others  [10].  The  wavelet  coefficients 
of  g  are  independent  and  distributed  according  to  7(771,  n)  ~  Ar(0, where  7(771,77)  is  the  coefficient  at 
scale  777  and  shift  T7.  The  parameter  controls  the  overall  magnitude  of  the  process  while  p.  determines  the 
fractal  structure  of  sample  paths.  The  case  p  =  0,  corresponds  to  g  being  white  noise  while  as  p  increases,  the 
sample  paths  of  g  show  greater  long  range  correlation  and  smoothness.  In  addition  to  defining  the  scale-varying 
probabilistic  structure  of  the  wavelet  coefficients,  we  also  must  provide  a  statistical  model  for  the  coarsest  scale 
scaling  coefficients,  g{Lg,n).  Roughly  speaking,  these  coarse  scale  coefficients  describe  the  DC  and  low-frequency 
structure  of  g.  In  the  applications  we  consider  here,  we  assume  that  we  have  little  a  priori  knowledge  concerning 
the  long-term  average  value  of  g{x).  Consequently,  we  take  g{Lg,  m)  ~  A/’(0,  where  is  some  sufficiently 
large  number.  By  choosing  pL,  in  this  manner,  we  avoid  any  bias  in  the  estimator  of  the  low  frequency  structure 
of  g{x).  Thus,  we  have  that  7  ~  A'(0,  Pq)  where 

Po  =  block  diag{Po{Mg  -  1),  . . . ,  Po{Lg),  Po{Lg)) 

Po(m)  =  ?o(n7)  =  Pl,In,(l,) 

with  an  n  X  77  identity  matrix  and  Ng{m)  the  number  of  nonero  coefficients  in  the  wavelet  transform  of  g  at 
scale  777. 


3.2  The  relative  error  covariance  matrix 

A  key  advantage  of  the  use  of  statistical  estimation  techniques  is  the  ability  to  produce  not  only  the  estimate 
but  also  an  indication  as  to  the  quality  of  this  reconstruction.  Associated  with  the  MAP  estimator  is  the  error 
covariance  matrix,  P,  which  under  the  Gaussian  models  defined  in  Section  3.1  takes  the  form 

P=  (©’’p-^e-i-Po”^)"^  (6) 

The  diagonal  components  of  P,  the  error  variances,  are  commonly  used  to  judge  the  performance  of  the  estimator. 
Large  values  of  these  quantities  indicate  a  high  level  of  uncertainty  in  the  estimate  of  the  corresponding  component 
of  7  while  small  error  variances  imply  that  greater  confidence  may  be  placed  in  the  estimate. 

While  the  information  contained  in  P  is  important  for  evaluating  the  absolute  level  of  uncertainty  associated 
with  the  estimator,  in  many  cases,  it  is  more  useful  to  understand  how  data  serves  to  reduce  uncertainty  relative 
to  some  reference  level.  That  is,  we  have  some  prior  level  of  confidence  in  our  knowledge  of  7  and  we  seek  to 
comprehend  how  the  inclusion  of  additional  data  in  our  estimate  of  7  alters  our  uncertainty  relative  to  this  already 
established  level.  Toward  this  end,  we  define  the  relative  error  covariance  matrix  (RECM)  as 
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U{A,B)=I-P2^'^PbPa"'‘  (7) 

which  is  the  matrix  analog  of  the  scalar  l  —  hja  i.e.  the  relative  difference  between  two  real  number  a  and  h.  Here 
A  and  B  are  index  sets  with  A,BC.  {1, 2, . . Jf}.  The  quantity  Pa  (resp.  Pb)  is  the  error  covariance  matrix 
associated  with  the  MAP  estimate  t(A)  (resp.  7(S))  where  7(A)  (resp.  7(5))  is  the  estimate  of  7  based  upon 
data  from  all  observation  processes  r]i  with  i  £  A  (resp.  i  £  B.)  Finally,  we  define  the  error  covariance  matrix 
associated  with  no  observations,  P{0},  as  the  prior  covariance  matrix  Pq. 

In  the  event  Pa  is  diagonal,  the  diagonal  components  of  n(A,  B)  are  particularly  easy  to  interpret.  Let  cr?(A) 
be  the  error-variance  of  the  component  of  7  arising  from  an  estimate  based  upon  data  from  set  A.  Then, 
the  component  of  the  diagonal  of  n(A,  5)  is  just  1  —  a^{B)/a^{A)  which  is  nothing  more  than  the  relative 
size  difference  of  the  error- variance  in  the  component  of  7  based  upon  data  from  sets  A  and  B.  Note  that 
the  diagonal  condition  of  Pa  is  met  in  this  paper  when  Pa  =  Po-  Thus,  the  diagonal  elements  of  n({0},  B) 
represent  the  decrease  in  uncertainly  due  to  the  data  from  set  B  relative  to  the  prior  model.  Where  there  will  be 
no  confusion,  we  shall  abuse  notatidit  and  write  n({0},  B)  as  n(B). 

The  quantity  n(A,  B)  represents  a  useful  tool  for  quantitatively  analyzing  the  relationship  between  the  char¬ 
acteristics  of  the  data  (as  defined  by  ©  and  R)  and  the  structure  of  the  estimate  7.  Consider,  for  example, 
the  case  in  which  we  wish  to  assess  the  overall  value  of  a  set  of  sensors.  That  is,  suppose  that  A  =  0  and 
B  =  {any  set  of  sensors}  so  that  n(A,  B)  =  n(5)  measures  the  contribution  of  the  information  provided  by  this 
set  of  sensors  relative  to  that  of  the  prior  model.  We  begin  by  defining  n™(S)  as  the  value  of  the  element  on  the 
diagonal  of  the  matrix  11(B)  corresponding  to  the  wavelet  coefficient  at  scale/shift  (m,  n).  To  avoid  ambiguity, 

we  use  the  notation  Iln"  to  refer  to  the  RECM  information  for  the  coarsest  scaling  coefficient  of  g  at  shift  n.  If 
njj‘(5)  is  large  then  the  data  provides  considerable  information  regarding  the  structure  of  g  at  (m,  n).  In  partic¬ 
ular,  this  quantity  provides  us  with  a  natural  way  in  which  to  define  the  scale  to  which  g  should  be  reconstructed. 
Consider  the  finest  scale  of  our  representation,  the  scaling  coefficients  g{Mg,j).  Using  the  terminology  introduced 
in  Section  2.2,  we  say  that  the  data  supports  a  reconstruction  of  g(Mg,j)  at  scale  m  if  there  exists  some  node  in 
the  wavelet  lattice  of  g  at  scale  m  which  satisfies  the  following 

1.  The  node  impacts  g(Mg,j)  (i.e.  for  some  shift  n,  g(Mg,j)  £  D(m,  n)). 

2.  The  data  provides  a  sufficiently  large  quantity  of  information  regarding  the  structure  of  g  at  node  (m,  n) 
(i.e.  n™(B)  is  in  some  sense  large). 

Clearly,  the  finest  level  of  detail  supported  by  a  data  set  at  shift  j,  denoted  m*{j),  is  the  finest  scale  for  which 
a  node  (m,  n)  may  be  found  that  satisfies  the  above  two  criteria  and  in  general  is  a  function  of  position  (i.e. 
a  function  of  the  shift  j  at  scale  Mg.)  The  precise  quantification  of  “sufficiently  large”  will  depend  upon  the 
particular  application  and  on  the  structure  of  the  particular  inverse  problems  under  investigation. 

In  addition  to  its  use  is  assessing  the  scale  of  reconstruction  supported  by  the  information  from  a  set  of  sensors, 
if  we  consider  the  case  where  neither  A  nor  B  is  empty,  we  find  that  there  are  several  ways  in  which  n(A,  B)  may 
be  of  use  in  assessing  the  value  of  fusing  information  from  multiple  sensors  and  in  identifying  how  this  fusion  takes 
place.  For  example,  if  A  C  5,  then  n(A,  B)  provides  us  with  a  measure  of  the  value  of  augmenting  sensor  set  A  to 
form  sensor  set  B.  Roughly  speaking,  if  n(A,  B)  is  significantly  larger  than  0,  there  is  a  benefit  in  the  additional 
information  provided  by  the  sensors  in  5  —  A.  Moreover,  we  can  use  the  quantities  HJJ*  (A,  B)  to  pinpoint  the 
scales  and  locations  at  which  this  fusion  has  significant  benefit  i.e.,  those  scales  and  shifts  at  which  active  sensor 
fusion  is  taking  place.  Furthermore,  by  varying  the  sets  A  and  B,  we  can  identify  which  sensors  are  actively 
used  to  obtain  that  estimate.  That  is,  for  each  (m,  n),  we  can  in  principal  find  the  set  A  C  {1,  ... ,  K}  so  that 
n™(A,  {1,  . . . ,  K})  is  small  (so  that  sensors  not  in  A  provide  little  additional  information  to  the  reconstruction 
of  wavelet  coefficient  (m,  n))  and  so  that  for  any  C  C  A,  n™(C,  A)  is  of  significant  size  (so  that  all  of  the  sensors 
actively  contribute  to  the  reconstruction  at  this  scale  and  shift.) 
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Property 

Value 

Wavelet 

Daubeschies  6  tap 

Finest  Scale  (Mg ) 

7 

Coarsest  Scale  (Lg ) 

3 

P 

2.0 

10 

Pl, 

0.25 

(b)  ParBmcter  values  for 


(b)  Fractal  function  to  be  reconstructed.  Ap¬ 
proximation  coefficients  at  scale  Mg  =  7. 


Figure  3;  The  finest  scale  approximation  coefficients  of  g  and  the  parameter  values  for  this  function. 


4  Examples 


As  described  previously,  our  multiscale,  stochastic  methodology  provides  a  natural  ftamework  for  addressing 
a  variety  of  issues  arising  in  the  study  of  linear  inverse  problems  including 

•  Regularization 

•  Multisensor  data  fusion 

•  The  processing  of  data  possessing  sparse  or  irregular  sampling  patterns 

•  Defining  the  optimal,  space  varying  scale  of  reconstruction 

In  this  section,  we  anrdyze  one  example  demonstrating  may  of  the  above  points.  A  more  extensive  set  of  illustra¬ 
tions  may  be  found  in  [5]  and  in  the  talk  accompanying  this  paper.  Here,  we  consider  a  two  channel  deconvolution 
problem  with  kernel  function  Tc  and  T/  shown  in  Figure  1.  The  function  to  be  reconstructed  is  a  sample  path  of 
1//  of  process  defined  by  the  parameters  in  Figure  3(a)  and  is  displayed  in  Figure  3(b). 

A  common  characteristic  of  linear  inverse  problems  is  the  desire  to  estimate  g  over  some  closed  and  bounded 
region  based  upon  measurements  some  of  which  are  available  only  at  or  near  the  boundary  of  this  region.  As 
discussed  in  Section  1  for  problems  with  a  convolutional  structure,  such  a  distribution  of  data  points  makes  the 
use  of  Fourier-based  techniques  problematic.  In  contrast,  the  multiscale,  statistical  MAP  inversion  algorithm  we 
have  described  is  ideally  suited  to  handling  such  problems.  To  illustrate  this,  we  consider  a  configuration  of  the 
two  channel  deconvolution  problem  in  which  y/  is  available  only  near  both  ends  of  the  interval  while  yc  is  sampled 
over  the  entire  interval.  In  this  case,  the  noiseless  and  noisy  data  sets  are  shown  in  Figure  4.  The  signal-to-noise 
ratio  (SNR)  for  each  process  is  3.  The  SNR  of  the  vector  r)i  =  ©<7  4-  Vi  with  Vi  ~  ^f{0,  r?J)  and  7  ~  Pq)  is 
defined  as 

SNR^  -  Ppw”  _  tr{QiPo&f) 

'  Power  per  pixel  in  Ui 

where  Ng  is  the  length  of  the  vector  7  and  tr  is  the  trace  operation. 

The  sampling  structure  associated  with  y/  is  handled  quite  easily  using  wavelet  transforms.  Specifically,  we 
split  yy  into  its  left  and  right  components  and  treat  each  separately.  In  effect,  this  is  equivalent  to  windowing 
yy  and  applying  Wy  individually  to  each  windowed  version  of  the  data.  We  note  that  unlike  Fourier  techniques 
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where  space-domain  windowing  can  cause  significant  distortion  of  the  signal  in  the  frequency  domain,  no  significant 
distortion  is  present  here^. 

The  estimates  of  g  are  displayed  in  Figure  5.  We  see  that  over  the  middle  of  the  interval,  g{{f,  c})  is  roughly  the 
same  as  ff({c})  while  at  either  end,  information  from  yf  is  used  almost  exclusively  in  the  inversion.  Additionally, 
Figure  5  shows  that  given  only  y/,  the  estimator  does  make  an  attempt  to  recover  g  over  the  interior  of  the 
interval,  but  such  an  estimate  is  increasingly  in  error  the  farther  one  proceeds  toward  the  middle. 

In  Figure  6(a)-(d),  the  diagonal  components  of  n(J?)  are  plotted  for  B  C  {{/})  {c},  {/,  c}}  and  for  scales^ 

3  and  4.  We  observe  that  for  scale-shift  pairs  (m,  n)  interior  to  the  boundary  region  in  which  fine  scale  data 
are  available,  n™({/})  is  essentially  zero  indicating  the  almost  complete  lack  of  information  in  yf  about  g  over 
these  shifts.  However,  for  pairs  (m,  n)  corresponding  to  locations  near  either  boundary,  information  in  yf  almost 
completely  dominates  that  in  yc.  In  Figures  6(d),  the  utility  of  adding  j/c  to  an  estimate  based  upon  yf  is 
illustrated  by  displaying  n®({/},  {/,  c}).  Again  the  contribution  of  the  coarse  scale  data  is  greatest  away  from 
the  end  of  the  interval.  In  Figures  6(b)  and  (c),  we  observed  the  presence  of  active  sensor  fusion  over  selected 
shifts  at  these  scale.  That  is  for  cert£n  n  and  for  j  6  {3,4},  n^({/,  c})  is  significantly  larger  that  both  ni({c}) 
and  ({/}).  Thus,  the  RECM  is  able  to  localize  both  in  sc^e  and  in  shift  the  precise  locations  where  the 
presence  of  both  data  sets  yields  significantly  more  information  than  either  alone.  Finally,  for  scales  other  than 
3  and  4,  the  two  observation  sources  provide  little  if  any  significant  information  to  the  reconstruction  of  g. 

For  the  case  considered  here,  define  the  shift-varying  optimal  scale  of  reconstruction  given  both  yc  and  yf  in 
the  following  manner.  The  diagonal  structure  of  Pq  implies  that  0  <  n™(A)  <  1  so  that  determining  whether 
n”(A)  is  “sufficiently  large”  is  accomplished  by  comparing  this  quantity  to  some  threshold,  r,  between  zero  and 
one.  Thus,  m*{j),  the  finest  scale  of  detciil  supported  in  a  reconstruction  at  scale  Mg  and  shift  j,  is  the  largest 
m  such  that  there  exists  a  shift  j  for  which  (1)  g{Mg,  n)  €  'D{m,j)  and  (2)  n™(A)  >  r.  Given  this  procedure  for 
determining  the  optimal  scde  of  reconstruction,  we  are  led  to  define  -yr,  a  truncated  version  of  7,  as  follows: 

_/0  n™(^)<r 

y-r  (m,n)  |  otherwise 

where  [7](m,n)  is  the  component  in  the  vector  7  at  scale  m  and  shift  n.  Defining  jr  in  this  way  ensures  that 
ffr  =  W^7t  is  in  fact  the  reconstruction  of  g  which  at  each  shift  j  contains  detciil  information  at  scales  no  finer 
than  m*{j). 

In  Figure  7(a),  we  plot  m*{j)  using  the  noisy  data  sets  of  Figure  4  for  r  =  0.45.  Here  we  see  that  near 
the  boundaries,  the  presence  of  fine  scale  data  allows  for  higher  resolution  in  the  reconstruction  of  g  while  in 
the  middle  of  the  interval,  we  must  settle  for  a  coarser  estimate.  From  Figure  7(b)  we  see  that  there  is  little 
difference  between  the  optimal  estimate,  g,  and  its  truncated  version,  ffo.45-  Thus,  the  relative  error  covariance 
matrix  ancilysis  can  be  used  to  evaluate  a  particular  parameterization  of  g.  Given  the  structure  of  the  observation 
processes,  we  see  that  g  is  overparameterized  as  the  data  provide  little  useful  fine  scale  information  relative  to 
that  found  in  the  prior  model.  Any  attempt  to  recover  these  components  of  y  is  effectively  a  waste  of  time  and 
computational  resources.  Rather,  the  RECM  suggests  that  a  more  parsimonious  description  of  g  is  warranted 
and  even  indicates  how  such  a  model  should  be  constructed  based  upon  the  information  available  in  the  data. 
That  is,  given  the  structure  of  the  observation  processes,  the  original  parameterization  of  g  involving  256  degrees 
of  freedom  is  clearly  excessive.  Rather,  the  data  dictates  that  for  r  =  0.45  at  most  only  24  parameters  (i.e.  the 
number  of  nonzero  elements  of  yo.45)  need  be  estimated. 

'The  only  distortion  is  caused  by  the  edge  effects  arising  from  the  circulant  implementation  of  the  wavelet  transform  as  discussed 
in  Section  2.2  and  as  we  have  discussed,  these  effects  are  generally  negligible  or  can  be  overcome  completely  through  the  use  of 
modified  wavelet  transforms  obtained  over  compact  intervals 

^The  imusual  activity  at  the  right  hand  edge  of  these  plots  is  an  artifact  of  the  circulant  implementations  of  wavelet  transform 
operator  Wj  [5] 
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(b)  Noiseless  (solid  line)  and  noisy  (dashed 
line)  versions  of  yf.  SNRf  =  3 


(b)  Noiseless  (solid  line)  and  noisy  (dashed 
line)  versions  of  yc.  SNRc  =  3 


Figure  4:  Data  sets  for  use  in  reconstruction  with  the  SNRf  =  SNRc  =  3  and  yf  available  only  near  the  end  of 
the  interval. 


(»)  9({/>c})  (solid  line)  versus  0({/})  (b)  a({/,c})  (solid  line)  versus  a({c}) 

(dashed  line)  (dashed  line) 


Figure  5:  Estimates  of  g  using  various  combinations  of  yf  and  yc  for  the  case  where  SNRf  =  SNRc  =  3  and  yf 
is  available  only  near  the  edges  of  the  interval.  We  see  that  at  the  boundaries,  the  estimate  given  both  yc  and 
yf  essenticdly  makes  use  only  of  yf.  Over  the  center  of  the  interval  where  yf  is  absent,  5({/,  c})  follows  g({c}) 
closely. 
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(b)  Solid  lines  =  nj^({/,c}).  Daslud  lines 
=  n®  ({/}).  Dot-dashed  lines  =  11®  ({c}). 


(b)  Solid  lines  =  H®  ({l,c}).  Dashed  lines 
=  n5^({/})-  Dot-dashed  lines  =  nj^({c}). 


(c)  Solid  lines  =  n^({/,  c}).  Dashed  lines 
=  nj^({/}).  Dot-dashed  lines  =  n^({c}). 


(d)  n®({/}.{/.c}). 


Figure  6:  Relative  error  covariance  information  for  the  case  of  SNRf  =  SNRe  =  3  with  yf  available  only  near 
the  ends  of  the  interval.  For  scales  3  and  4,  (a)-(c)  indicate  that  at  the  ends  of  the  interval,  the  variance  reduction 
given  both  yf  and  yc  is  equal  to  that  given  only  yj.  Alternatively,  yc  impacts  the  RECM  data  primarily  in  the 
middle  of  the  interval.  In  (a)-(c),  there  is  some  active  sensor  fusion  taking  place  as  there  exists  shifts  at  these 
scales  for  which  Ilf  ({/,  c})  dominates  both  Ilf  ({/})  and  nf({c}).  From  (d),  it  is  observed  that  j/c  has  significant 
impact  relative  to  yf  in  lowering  the  variance  of  the  coarsest  scaling  coefficient  estimates  at  shifts  away  from 
either  end  of  the  interval. 
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(e)  The  space-varying,  optimal  scale  of  re-  (b)  Plot  of  g  (solid  line)  versus  go.ss 
construction  for  t  =  0?45  given  (1)  the  (dashed  line) 
complete  set  of  data  j/c  and  (2)  the  fine 
scale  data  yf  near  either  end  of  the  interva 


Figure  7:  The  optimal  scale  of  reconstruction  as  a  function  of  position  for  r  =  0.45  and  the  corresponding  estimate 
9r- 


5  Conclusions  and  Future  Work 


In  this  paper,  we  have  presented  an  approach  to  the  solution  of  linear  inverse  problems  based  upon  techniques 
drawn  from  the  fields  of  multiscale  modeling,  wavelet  transforms,  and  statistical  estimation.  This  formulation  is 
particularly  useful  in  describing  the  situation  where  there  exists  a  suite  of  measurements  each  of  which  conveys 
information  about  the  behavior  of  g  on  different  scales.  We  utilize  wavelet  methods  to  transform  the  problem 
from  real-space  to  scale-space.  A  maximum  a  posteriori  (MAP)  estimator  serves  cis  the  inversion  eJgorithm  and 
produces  an  estimate  not  of  g,  but  of  its  wavelet  transform,  7.  Regularization  is  achieved  via  a  statistical  model 
of  7  which  also  provides  a  means  of  capturing  any  available  prior  information  regarding  the  structure  of  g. 

Our  approach  makes  extensive  use  of  scale-space  in  the  analysis  of  linear  inverse  problems.  By  introducing  the 
notion  of  a  relative  error  covariance  matrix  (RECM),  we  have  developed  a  quantitative  tool  for  understanding 
quite  precisely  the  various  ways  in  which  data  from  a  multitude  of  sensors  contribute  to  the  final  reconstruction  of 
g.  Via  our  two  channel  deconvolution  example,  we  have  demonstrated  a  method  for  determining  the  optimal  level 
of  detail  to  include  in  the  estimate  of  gf  as  a  function  of  spatial  location.  The  incremental  benefits  associated  with 
the  addition  of  data  from  another  sensor  was  readily  explored  using  the  RECM.  Also,  we  have  shown  the  utility 
of  this  quantity  in  describing  the  process  of  multisensor  data  fusion  in  a  wavelet  setting  and  in  evaluating  the  the 
level  of  complexity  supported  in  a  representation  of  g  based  upon  the  information  content  of  the  observations. 
Finally,  in  addition  to  performing  the  RECM  analysis,  our  examples  highlight  the  ability  of  a  wavelet-based 
approach  to  handle  non-fuU  data  sets.  Specifically,  we  have  considered  the  case  where  one  source  of  information 
was  available  only  near  the  boundaries  of  the  interval. 

We  note  that  the  general  methodologies  presented  here  are  not  restricted  to  the  ID  deconvolution  problems. 
Our  techniques  can  be  used  without  alteration  for  one  dimensional  problems  involving  non-convolutional  kernels. 
Also,  the  extension  of  our  approach  to  multidimensional  inversions  can  be  accomplished  quite  easily  and  should 
be  of  great  use  in  the  analysis  and  solution  of  2D  and  3D  problems  which  typically  exhibit  more  severe  forms  of 
all  the  difficulties  found  in  the  ID  case.  Indeed,  in  [6],  we  consider  a  non-convolutional  2D  inverse  conductivity 
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problem  similar  to  those  found  in  geophysical  exploration. 

Although  not  considered  extensively  in  this  work,  the  multiscale,  statistically  based  inversion  algorithms 
admit  highly  efficient  implementations.  As  discussed  by  Beylkin  et.  al  in  [1] ,  wavelet  transforms  of  many  operator 
matrices,  ©,  contain  very  few  significant  elements  so  that  zeroing  the  remainder  lead  to  highly  efficient  algorithms 
for  applying  0  to  arbitrary  vectors.  These  sparseness  results  imply  that  the  least-squares  problems  defined  by 
the  wavelet-transformed  normal  equations  also  have  a  sparse  structure.  Thus  computationally  efficient,  iterative 
algorithms  such  as  LSQR  [7]  can  be  used  to  determine  7.  In  [4],  we  utilize  the  theory  of  partial  orthogonalization  [8] 
in  the  development  of  a  modified  form  of  LSQR.  Our  algorithm  is  designed  for  the  efficient  and  stable  computation 
of  7  as  well  as  arbitrary  elements  in  the  error  covariance  and  relative  error  covariance  matrices. 
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